An Embedding-Based Topic Model for Document Classification
Authors
Abstract
Topic modeling is an unsupervised learning task that discovers the hidden topics in a collection of documents. In turn, the discovered topics can be used for summarizing, organizing, and understanding the documents in the collection. Most existing techniques for topic modeling are derivatives of Latent Dirichlet Allocation, which uses a bag-of-words assumption for the documents. However, bag-of-words models completely dismiss the relationships between words. For this reason, this article presents a two-stage algorithm for topic modeling that leverages word embeddings and word co-occurrence. In the first stage, we determine the topic-word distributions by soft-clustering a random set of embedded n-grams from the documents. In the second stage, we determine the document-topic distributions by sampling the topics of each document from the topic-word distributions. This approach leverages the distributional properties of word embeddings instead of using the bag-of-words assumption. Experimental results on various data sets from an Australian compensation organization show the remarkable comparative effectiveness of the proposed algorithm in a document classification task.
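The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple soft k-means (softmax over negative squared distances) as a stand-in for the soft-clustering step, uses random vectors in place of real embedded n-grams, and estimates a document's topic mixture by sampling topics from the soft memberships of its n-grams. The function names `soft_assign`, `soft_cluster`, and `document_topics`, and the `temperature` parameter, are hypothetical.

```python
import numpy as np


def soft_assign(embeddings, centroids, temperature=1.0):
    """Return soft topic memberships (each row sums to 1) for each embedding."""
    # Squared Euclidean distance from every point to every centroid.
    d2 = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def soft_cluster(embeddings, n_topics, n_iter=20, temperature=1.0, seed=0):
    """Stage 1 (assumed variant): soft k-means over embedded n-grams."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(embeddings), size=n_topics, replace=False)
    centroids = embeddings[start].copy()
    for _ in range(n_iter):
        memberships = soft_assign(embeddings, centroids, temperature)
        # Each centroid becomes the membership-weighted mean of all points.
        weights = memberships / (memberships.sum(axis=0, keepdims=True) + 1e-12)
        centroids = weights.T @ embeddings
    return centroids, soft_assign(embeddings, centroids, temperature)


def document_topics(doc_embeddings, centroids, temperature=1.0,
                    n_samples=200, seed=0):
    """Stage 2 (assumed variant): sample topics for one document's n-grams."""
    rng = np.random.default_rng(seed)
    memberships = soft_assign(doc_embeddings, centroids, temperature)
    n_topics = centroids.shape[0]
    counts = np.zeros(n_topics)
    for _ in range(n_samples):
        i = rng.integers(len(doc_embeddings))      # pick an n-gram at random
        k = rng.choice(n_topics, p=memberships[i])  # sample its topic
        counts[k] += 1
    return counts / counts.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    corpus_ngrams = rng.normal(size=(100, 16))  # placeholder embeddings
    centroids, _ = soft_cluster(corpus_ngrams, n_topics=4)
    theta = document_topics(corpus_ngrams[:12], centroids)
    print(theta)  # estimated topic mixture for one toy document
```

In a classification setting, a vector such as `theta` could serve as the feature representation of a document passed to any standard classifier; that pairing is likewise an assumption made here for illustration.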
Similar resources
A New Document Embedding Method for News Classification
Abstract: Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
A link-bridged topic model for cross-domain document classification
Research on Food Complains Document Classification Based-on Topic
In this paper, we design a topic-based classifier for food complaint documents and take a series of measures in its implementation. To accomplish feature reduction, a filter method named term filtering for independent topic features is proposed to compress each topic feature vector. We introduce the created food ontology as background knowledge and to expand the semantic o...
An Ensemble Click Model for Web Document Ranking
Every year, web search engine providers spend more and more money on ranking documents in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
An Automatic Approach for Document-level Topic Model Evaluation
Topic models jointly learn topics and document-level topic distributions. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g. by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propo...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2021
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3431728